Introduction


With the increased complexity of Field Programmable Gate Array (FPGA) technology, users are now able 
to utilize FPGAs to implement System On a Chip (SOC) applications. A design's state space consists of a 
combination of its hardware and software. Because of the large number of gates, the modes of operation, 
and the amount of software contained within SOCs, they tend to be highly complex 
solutions. This complexity makes it nearly impossible for a customer to verify such products with 
near 100% fault test coverage, due to limitations such as time, verification tool constraints, memory 
restrictions, and available tester speeds. 

In order to increase test coverage, the ASIC industry has developed Design For Test (DFT) 
methodologies [1]. However, such schemes have not been fully embraced by the FPGA community and test 
coverage remains an issue. When characterizing single particle radiation response within these devices, the 
goal is to compare, via test processes, normal operational response versus ionizing fault response. 
Obtaining the ability to observe single particle radiation-induced faults in a SOC increases the intricacy of 
the test requirements exponentially. Given the inherent restrictions in test coverage within normal 
operational environments, it becomes unrealistic to aim for a full radiation-response characterization 
covering all possible states of such products. However, an effective analysis must be performed to 
accurately determine project-specific risk reduction techniques. Traditional space qualification 
approaches may no longer be valid for contemporary, complex integrated circuits unless unrealistically large 
sample sizes and particle fluences are utilized. The goal is therefore to constrain the targeted state space to 
a level that will provide acceptable information to qualify the SOC operability in space. 

In this paper we present the qualification methodology we have applied to one SOC: the Xilinx Virtex-4 
XC4VFX60-SOC as implemented in the NASA Space Cube targeted for the Express Logistics Carrier (ELC) 
mission. ELC is a carrier that transports equipment and material to and from the International Space Station 
(ISS). The Space Cube utilizes a redundant PowerPC topology within the FX60 to perform several data 
processing functions in a space radiation environment. Within this discussion, we also present a 
synopsis of the NASA High Speed Digital Tester (HSDT) [2], which contains an "all-in-one" custom designed 
Virtex-4 Configuration Manager, Scrubber, Fault Injector, and Read-back Manager. 
Constraining the Design State Space - A Definition


A synchronous design’s state space is related to the number of D flip-flops (DFFs) utilized. In its simplest 
form, the upper bound of the design’s possible state space is represented as: 

State Space = $2^n$ (n: number of DFFs implemented within the DUT) [3] 

Combinatorial logic is not considered as part of the equation because synchronous designs are defined 
by clock periods. At the end of each clock period the circuit is settled in one of $2^n$ possible DFF states. 
Designs that are not affected by noise or radiation are, in most cases, bounded by a much smaller subset of 
the $2^n$ upper bound. However, when noise or radiation is a factor, any state is reachable. 
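
To give a sense of scale for this bound, the following minimal sketch computes the upper bound for a few DFF counts and the time needed to visit every state once. The DFF counts are arbitrary illustrations (not figures for the ELC design), the function names are hypothetical, and the assumption of one new state per clock cycle at 250 MHz is a deliberate simplification.

```python
# Illustrative sketch: upper bound on the state space of a synchronous design.
# The DFF counts used below are arbitrary examples, not ELC design figures.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def state_space_upper_bound(n_dffs: int) -> int:
    """Return 2^n, the upper bound on the number of states for n DFFs."""
    if n_dffs < 0:
        raise ValueError("number of DFFs must be non-negative")
    return 2 ** n_dffs


def years_to_visit_all_states(n_dffs: int, clock_hz: float) -> float:
    """Years needed to visit every state once, assuming (optimistically)
    that a new, previously unseen state is reached on every clock cycle."""
    return state_space_upper_bound(n_dffs) / clock_hz / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (8, 32, 64, 128):
        years = years_to_visit_all_states(n, clock_hz=250e6)
        print(f"{n:4d} DFFs: 2^{n} states, about {years:.3g} years at 250 MHz")
```

Even a few tens of DFFs push exhaustive coverage beyond any practical beam time, which is why the targeted state space has to be constrained.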

Due to limits on time, money, and physical resources, it is impossible to test every condition needed to cover the 
design’s entire state space when performing Single Event Upset (SEU) radiation testing of complex circuits 
such as SOCs. The choice comes down to research versus application, and hence SEU tests have been divided into two 
categories: 

(1) Device Characterization (research driven): error rate calculations of device primitive circuits such as 
(but not limited to) DFFs, inverters, buffers, Look-up Tables (LUTs), I/O, and configuration memory. 
Accurately determining error cross sections for multifaceted device primitives can take years due to the 
complexity of determining each element's contribution to the error cross sections. Fault masking and element 
cascading create nonlinear effects and dependencies, and are major contributors to the necessity of 
developing large test sets that can home in on particular elements and accurately measure their single event 
responses. 

(2) Design Characterization (application driven): targets a specific design under test. Such calculations, 
when constrained and analyzed correctly, may only take months. Element observability is minimal. 
Therefore, designs should be specific to the application under investigation. The objective is to determine 
the strength of a given design in a space environment. 


To be presented by Melanie Berg at the IEEE Nuclear and Space Radiation Effects Conference (NSREC), July 23-27, 
2007 and to be published in the 2007 IEEE Radiation Effects Poster Session and on http://rahome.gsfc.nasa.gov 


Setting Goals and Specifications for ELC SEU Testing 


The requirement was to supply the ELC mission with a radiation characterization of the Xilinx Virtex-4 
XC4VFX60-SOC with dual-core embedded PowerPC processors. Due to the stringent schedule constraints 
required by the project, design characterization of the Space Cube was considered. However, due to the 
complexity of this processor-based design, its state space coverage had to be defined and constrained. In 
order to constrain the SOC’s state space, a strong understanding of the targeted device and the design’s 
infrastructure was essential. To be concise, the objectives were: 

(1) To develop a design under test (DUT) that was compatible with the actual design targeted for the ELC 
mission 

(2) To constrain the complex state space such that the design’s characterization was informative and a 
good representative of the actual flight project 

(3) To observe and compare possible radiation hazard responses 

(4) To determine an appropriate fault cross-section metric, in essence to supply the mission with 
qualification data. 
Synopsis of the ELC SOC Design


The Space Cube processor card is populated with two Virtex-4 FX60 devices yielding a total of four 
PowerPC processors. Each processor is allocated 50% of the FPGA fabric and is considered an 
independent processor node. All four processors run independently of each other, and results are voted on 
by a separate rad-hard FPGA. The rad-hard FPGA is also tasked with trapping error conditions and flagging 
which processor node needs immediate attention. A combination of internal and external interaction will 
bring the malfunctioning processor back online and resynchronize its tasks with the other processors. 
Different approaches are being considered to bring the processors back online (e.g., warm reset, full re-boot, 
partial reconfiguration). Radiation test results are intended to help the Space Cube team determine which 
recovery procedure is needed, as well as which mitigation best keeps the processors functioning as long as 
possible (e.g., scrubbing, an Internal Configuration Access Port (ICAP) hardware controller, or some combination of 
the two). 
Figure 1 shows the block diagram of the embedded PowerPC and the major FPGA interfaces. The current 
hardware system interfaces to the processor using the Xilinx-specific Processor Local Bus (PLB). All 
instructions and data are stored in external RAM. 
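
The voting and flagging role of the rad-hard FPGA can be sketched in software as follows. This is only an illustration of majority voting across independent processor nodes: the actual voter logic is not described here, so the function name `vote`, the integer result type, and the strict-majority rule are all assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple


def vote(results: List[int]) -> Tuple[Optional[int], List[int]]:
    """Majority-vote over the results reported by the processor nodes.

    Returns (majority value, indices of nodes that disagree with it).
    If no value is held by a strict majority of nodes, the majority value
    is None and every node is flagged (hypothetical policy).
    """
    if not results:
        raise ValueError("at least one result is required")
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        return None, list(range(len(results)))
    flagged = [i for i, r in enumerate(results) if r != value]
    return value, flagged


if __name__ == "__main__":
    # Four processor nodes; node 2 reports a result that differs from the rest.
    value, flagged = vote([7, 7, 9, 7])
    print("voted result:", value, "nodes needing attention:", flagged)
```

A flagged node index is the kind of information that would then drive the recovery action (warm reset, full re-boot, or partial reconfiguration).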
Figure 1: ELC PowerPC Block Diagram.
Partitioning and Selecting the Functional State Space for Testing


Partitioning Scheme 


The functional state space was partitioned as follows: 

(1) CPU (and its interfaces) along with Cache Units and 

(2) MMU and Timers/Debug Logic. 

Partitioned logic will be tested in stages. This discussion focuses on a portion of stage one: the CPU and 
its interfaces. 

Design Under Test 
General 

As previously stated under the ELC objectives for a design characterization, the selected test structure 
should be representative of the application. Figure 2 shows a block diagram of the processor node test design. 
This figure depicts only half of the Virtex-4 FX60 implemented as the Design Under Test (DUT) unit. The 
processor node design is instantiated for both processors, and each has independent control lines from the tester. 
In addition to this processor design, the radiation test design includes a large shift register to exercise the 
logic portion of the FPGA (used for latch-up testing). 

Custom High-Speed Peripheral 

The Custom High-Speed Peripheral (CHSP) (illustrated in Figure 2) is an instantiated memory IP core: 
8x32 bits. This is a novel approach to high-speed SEU PowerPC testing. The processor writes to the CHSP 
as if it were writing to a 32-bit memory location. However, there is no memory on this port: it is actually the HSDT 
posing as memory and grabbing the data. The raw data rate is 64 MHz by 32 bits; however, memory 
write overhead reduces this throughput by a factor of 4. Therefore, data rates through the CHSP are 512 
Mb/sec. 
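
The throughput figure follows directly from the numbers quoted above; the short sketch below reproduces the arithmetic. The function name is hypothetical, and the factor of 4 is taken as given rather than derived from the bus protocol.

```python
def chsp_throughput_mbps(clock_mhz: float, bus_width_bits: int,
                         overhead_factor: float) -> float:
    """Effective throughput in Mb/s: raw rate divided by the write overhead."""
    raw_mbps = clock_mhz * bus_width_bits  # 64 MHz x 32 bits = 2048 Mb/s raw
    return raw_mbps / overhead_factor


if __name__ == "__main__":
    print(chsp_throughput_mbps(64, 32, 4))  # 512.0
```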
Figure 2: Single Processor Node DUT design.
Defining ELC Test Requirements


In order to create a test vehicle with a sufficient number of observable points of the design’s state, the 
tester must contain a large number of high-speed I/O that connect directly to the DUT. The Xilinx Virtex-4 is 
an SRAM-based FPGA: its configuration is stored in SRAM. It is therefore necessary to be able to scrub 
the configuration memory, i.e., to correct or re-write erroneous bits. 



Figure 3: HSDT and XC4VFX60-SOC Test Vehicle. 
HSDT to DUT Interface - General Description

Because of its large number of available high-speed I/O, the HSDT was utilized for SEU radiation testing 
of the XC4VFX60-SOC. The HSDT controls the IRQ, clock, and reset inputs to the DUT. It is also the 
responsibility of the HSDT to grab data from the DUT via the CHSP or the UART interfaces. From the 
perspective of the HSDT, the purpose of the DUT’s CHSP is to create a high-bandwidth bus to the DUT on which: 

(1) The tester can control the DUT 

(2) The DUT can transfer dynamic (high-speed) data to the HSDT 

(3) The tester can send instruction code to the DUT configuration SRAM (through a custom made 
SRAM CNTRL unit that can MUX between processor control and HSDT control). 

Fault Injection 

The third function listed above is a novel approach: it is also used for tester-controlled, real-time fault 
injection into the instruction and/or data path. The test software application code is loaded into and executed out of 
external RAM. 

Frequency Considerations 

To keep the DUT representative of the application, the DUT processors were exercised at 
the same speed as proposed for the ELC mission: 250 MHz. Data acquisition was implemented in two 
separate categories: 

(1) “Ping Pong” - Interrupt driven counter increment followed by a transmission to the HSDT. Time 
between Interrupts was varied. 

(2) Constant data acquisition: Counter was incremented every PC cycle and sent to the HSDT via the 
CHSP 


Scrubbing 

ELC plans to utilize the Xilinx ICAP and Frame ECC cores for scrubbing the XC4VFX60. Such cores are 
DUT-internal (unhardened) scrubbing mechanisms. To determine the effectiveness of this scrubbing 
methodology, the internal scrubber was compared against an external scrubber that can potentially be 
hardened. 

External Tester Control of the SelectMAP Interface 

The HSDT stores the DUT’s configuration bit file within onboard SRAM. The user is able to control 
configuration, reconfiguration, scrubbing, fault injection, and read-back from the HSDT console. Fault 
injection was used to assist with error analysis and scrubbing verification. 

Internal Scrubbing: ICAP and FRAME ECC 

Knowing that the ICAP/FRAME ECC design is based on a single error correct, double error detect 
(SECDED) correction scheme, both single and multiple bit frame errors were injected via the HSDT. All 
single bit errors were corrected. Double bit errors were detected but not corrected, as expected. However, 
some multiple bit errors went undetected, and some false corrections were also observed. This is of 
interest because multiple bit errors have been observed to be a major concern within the V4 configuration 
memory upon radiation exposure [3]. For further comparison studies, we irradiated the ICAP design utilizing our 
custom scrubber under a heavy-ion beam at the Texas A&M University Cyclotron (TAMU) on February 17-20, 2007. 
Please see Figures 4 and 5 for results. 
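
The behaviour described above is characteristic of any SECDED code, and can be reproduced with a toy example. The sketch below uses a small extended Hamming (8,4) code, which is an assumption chosen purely for brevity and is not the code implemented by the Xilinx FRAME ECC core; all function names are hypothetical.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the usual
    Hamming(7,4) positions with parity bits at positions 1, 2 and 4.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2
    return code


def decode(word: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data); status is 'ok', 'corrected' or 'detected'."""
    word = list(word)
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i            # XOR of the positions of all set bits
    parity_error = sum(word) % 2     # 1 if an odd number of bits flipped
    if syndrome == 0 and parity_error == 0:
        status = "ok"
    elif parity_error == 1:
        word[syndrome] ^= 1          # decoder assumes exactly one flipped bit
        status = "corrected"
    else:
        status = "detected"          # even number of flips: detect only
    return status, [word[3], word[5], word[6], word[7]]


def flip(word: List[int], positions: List[int]) -> List[int]:
    """Return a copy of word with the given bit positions inverted."""
    out = list(word)
    for pos in positions:
        out[pos] ^= 1
    return out


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    cases = [("single", [5]), ("double", [2, 6]),
             ("triple", [1, 2, 4]), ("quadruple", [0, 1, 2, 3])]
    for label, positions in cases:
        status, decoded = decode(flip(encode(data), positions))
        print(f"{label:9s} error: status={status:9s} "
              f"data intact={decoded == data}")
```

In this toy code a triple error is reported as "corrected" although the returned data is wrong (a false correction), and a suitably placed quadruple error is reported as "ok" (an undetected error), mirroring the two failure modes seen during fault injection.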
Test Facilities - Proton

Testing was performed at heavy-ion and proton facilities. This presentation considers only proton SEU 
data. 

Facility: Indiana University Cyclotron Facility. 

Energy: 93 MeV and 200 MeV 

Flux: 1e7 protons/cm^2/s 

Fluence: All tests were run until Single Event Functional Interrupt (SEFI). 

Flux was kept extremely low at both the proton and heavy-ion test facilities. This 
decision was based on: 

(1) Previous SEU testing experience with SRAM-based FPGAs 

(2) The fact that the design is complex. Errors can accumulate before propagating, making it difficult to 
determine the accuracy of the error rate calculation. 
Proton SEU Results


Figure 4: SEFI Cross Sections - A Comparison of Scrubbers. (Horizontal axis: Energy (MeV).)

Figure 5: Error Cross-Section while Varying Time Between each Interrupt. (Axes: SEFI Cross Section (cm^2/device) versus Time between interrupt (ms).)
SEFI Error Cross-Section Calculations


Because we were performing a design characterization, it was necessary to measure error cross section 
by SEFI over fluence. We ran each test until a SEFI occurred; therefore, the error cross section per device is defined 
as: 


$$\sigma_{\mathrm{error}} = \frac{1}{\mathrm{fluence}}$$
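
As a worked illustration of this definition, the sketch below converts a run time at a given flux into a fluence and then into a per-device SEFI cross section. The flux matches the value listed under the proton test facility, but the 100 s run time is a made-up example rather than measured data, and combining several runs as the number of SEFIs over the total fluence is an assumption, not a procedure stated in this paper.

```python
from typing import Sequence


def fluence_to_sefi(flux: float, seconds_to_sefi: float) -> float:
    """Fluence (particles/cm^2) accumulated when the run stops at a SEFI."""
    return flux * seconds_to_sefi


def sefi_cross_section(fluence: float) -> float:
    """Per-device cross section (cm^2/device) for a run ended by one SEFI."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return 1.0 / fluence


def combined_cross_section(fluences: Sequence[float]) -> float:
    """Assumed way to pool runs: one SEFI per run over the total fluence."""
    total = sum(fluences)
    if not fluences or total <= 0:
        raise ValueError("need at least one run with positive fluence")
    return len(fluences) / total


if __name__ == "__main__":
    # Hypothetical run: SEFI observed after 100 s at 1e7 protons/cm^2/s.
    fluence = fluence_to_sefi(flux=1e7, seconds_to_sefi=100.0)
    print(f"fluence = {fluence:.3g} protons/cm^2")
    print(f"sigma   = {sefi_cross_section(fluence):.3g} cm^2/device")
```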
Conclusion


When performing an SEU analysis for a complex SOC device, it is important to have a full understanding 
of the design(s) under consideration. Device-level characterization is not achievable 
within the time constraints of a mission project (although it is achievable within a research environment). 
It is therefore important to develop a design characterization approach to radiation testing for maximum 
project risk reduction. 

It is necessary to implement a DUT that is a replica of (or very close to) the actual design within the project 
under investigation. However, due to the complexity, the design’s state space must be constrained without 
losing essential data and information. We chose to constrain the ELC Space Cube by: 

(1) Using only 2 out of the 4 PowerPCs 

(2) Selecting simple software routines that will not mask operation 

(3) Changing the frequency of processing (time to interrupt and constant high speed counting) 

Proton test results illustrate that using an external hardened scrubber reduces the error cross section by a 
factor of 10. There are many reasons for such results. However, the key is that the ICAP/FRAME ECC 
core is only a SECDED (single error correct, double error detect) module. It has been shown that 
configuration memory is subject to multiple bit hits. The external scrubber is capable of correcting any 
number of multiple bit hits as long as the DUT-internal scrubbing interface, scrubbing logic, scrubbing registers, 
and un-writable configuration bits are not hit. 




Acknowledgment 


The authors gratefully acknowledge support from ELC, the NASA Electronic Parts and Packaging 
Program (NEPP), the Defense Threat Reduction Agency under IACRO# 07-42071, and Xilinx Corporation. 


References 


[1] “The IBM ASIC/SoC methodology — A recipe for first-time success” 

http://www.research.ibm.com/journal/rd/466/doerre.html. 

[2] J.W. Howard, et al., “Development of a Low-Cost and High-Speed Single Event Effects Testers based 
on Reconfigurable Field Programmable Gate Arrays (FPGA),” SEESYM06, April 2006. 

[3] Melanie Berg, “FPGA Design Strategies for the Space Radiation Environment,” SEESYM06, April 2006. 




